Skip to content

Conversation

@arshidkv12
Copy link

  • Ensure php_url_parse_ex2 preserves UTF-8 characters in all URI components:
    host, path, query, fragment, and user info.
  • Replace previous byte-wise handling that caused mojibake on non-ASCII inputs.
  • Works correctly on macOS and other platforms with Unicode URLs.
  • Maintains compatibility with ASCII-only URLs.

@arshidkv12
Copy link
Author

/* }}} */

static void php_replace_controlchars(char *str, size_t len)
static void php_str_to_utf8(const char *str, size_t len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? You allocate a new string, but at the end of the function you throw it away?

Copy link
Author

@arshidkv12 arshidkv12 Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It allocate a zend_string *utf8 with zend_string_alloc(len * 4, 0).

    • This is intended to hold the UTF-8–encoded version of the input string.
  2. It fills UTF-8 with either the original ASCII bytes or the UTF-8 replacement character for non-ASCII bytes.

  3. At the end, immediately call zend_string_release(utf8);

  4. This frees the memory just allocated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that. But how does this solve the problem?
You never use the newly created string, and you now no longer replace control characters.
Furthermore, what's the idea behind using the replacement character? This should not be necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: After removing php_replace_controlchars, the code works correctly without any issues.

s++;
}
zend_string *utf8;
utf8 = zend_string_alloc(len * 4, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never open-code a multiplication inside an allocation size argument, this can cause security issues (integer overflow -> buffer overflow)
Use zend_string_safe_alloc instead.

}
s++;
}
zend_string *utf8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should merge the declaration and assignment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I changed it. zend_string *utf8 = zend_string_safe_alloc(len, 4, 0, 0);

@nielsdos
Copy link
Member

You can't just remove this, this was added in 184323c to prevent CR/LF issues in URLs. The proper solution is likely to replace the call to iscntrl to an explicit range check

@arshidkv12
Copy link
Author

Thank you. I think replacing \r and \n directly is fine.

Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid unnecessary whitespace changes and keep indentation using tabs

@arshidkv12
Copy link
Author

arshidkv12 commented Aug 17, 2025

I removed unnecessary whitespace. Please check it.

@arshidkv12
Copy link
Author

@nielsdos
Copy link
Member

https://github.com/openswoole/ext-openswoole/blob/525c247b0d95b69adeec2f4d98ad8809ae45c6fb/ext-src/php_swoole_http.h#L285 Swoole is checking \r \n and \0 only.

It's better to keep compatibility with what PHP does now already. Control chars are not really valid anyway IIRC.


I also see some usages of isalpha etc, while they do not cause problems in your test case, they could in general. So all the "is*" functions need to be expanded probably.

@arshidkv12 arshidkv12 closed this by deleting the head repository Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants